An R Package for Accommodating Islands and Disjoint Zones in Areal Spatial Modelling

useR! Conference 2024, Salzburg, July 9

Kevin Horan, Maynooth University

Motivation

Create neighbourhood structure for England and Wales parliamentary constituencies

Motivation

Create neighbourhood structure for England and Wales parliamentary constituencies

Motivation

Include Scotland

Motivation

Creating neighbourhood structure

  • spdep::poly2nb()
  • sfdep::st_contiguity()
  • necessary for
    • exploratory tasks such e.g. LISA, or
    • modelling spatial effects e.g. ICAR
  • inexperienced users seeking help on stackoverflow etc
  • even for experienced users, often want to have a visual component to creating these structures

Solutions

  • remove islands from the model
  • manually seek them out and link them
  • set buffer
  • maybe take an entirely different approach based on domain knowledge

sfislands Proposition

Package which encourages a workflow

  1. pre-functions

    • for setting up neighbourhood structure
  1. fed directly into model
  1. post-functions

    • for visualising model predictions

Pre-functions

function purpose
st_bridges() create a neighbourhood contiguity structure, with a k-nearest neighbours condition for islands
st_quickmap_nb() check structure visually on map
st_check_islands() check and record the contiguities which have been assigned to islands
st_manual_join_nb() make manual changes by adding connections
st_manual_cut_nb() make manual changes by removing connections

Simple scenario

1. st_bridges()

  • input: sf dataframe
  • output: sf dataframe with additional nb column containing named list of contiguities, with islands joined to k nearest neighbours
  • additional arguments available for k, nb column type, attachment to dataframe, exclusion of islands

1. st_bridges()

Default: outputs original sf dataframe with additional named list column nb

st_bridges(rectangles, 
           "name") |> 
  head()
Simple feature collection with 5 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 0 ymin: 0 xmax: 6 ymax: 4
CRS:           NA
   name      nb                       geometry
1 Rect1    2, 3 POLYGON ((0 0, 0 2, 2 2, 2 ...
2 Rect2 1, 3, 4 POLYGON ((2 0, 2 2, 4 2, 4 ...
3 Rect3 1, 2, 5 POLYGON ((2 2, 2 4, 4 4, 4 ...
4 Rect4       2 POLYGON ((5 0, 5 1, 6 1, 6 ...
5 Rect5       3 POLYGON ((0.8 3, 0.8 4, 1.8...

1. st_bridges()

Alternatively, a matrix column

st_bridges(rectangles, 
           "name", 
           nb_structure = "matrix") |> 
  head()
Simple feature collection with 5 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 0 ymin: 0 xmax: 6 ymax: 4
CRS:           NA
   name nb.1 nb.2 nb.3 nb.4 nb.5                       geometry
1 Rect1    0    1    1    0    0 POLYGON ((0 0, 0 2, 2 2, 2 ...
2 Rect2    1    0    1    1    0 POLYGON ((2 0, 2 2, 4 2, 4 ...
3 Rect3    1    1    0    0    1 POLYGON ((2 2, 2 4, 4 4, 4 ...
4 Rect4    0    1    0    0    0 POLYGON ((5 0, 5 1, 6 1, 6 ...
5 Rect5    0    0    1    0    0 POLYGON ((0.8 3, 0.8 4, 1.8...

1. st_bridges()

Other variations (e.g. not in dataframe, remove islands)

st_bridges(
      rectangles, 
      "name", 
      remove_islands = T, 
      nb_structure = "list", 
      add_to_dataframe = F) |>
head()
$Rect1
[1] 2 3

$Rect2
[1] 1 3

$Rect3
[1] 1 2
st_bridges(
      rectangles, 
      "name", 
      remove_islands = T, 
      nb_structure = "matrix", 
      add_to_dataframe = F) |> 
head()
      [,1] [,2] [,3]
Rect1    0    1    1
Rect2    1    0    1
Rect3    1    1    0

2. st_quickmap_nb()

Visualise structure quickly

st_bridges(
      rectangles, 
      "name", 
      link_islands_k = 1) |> 
st_quickmap_nb()

st_bridges(
      rectangles, 
      "name", 
      link_islands_k = 2) |> 
st_quickmap_nb()

2. st_quickmap_nb()

Options: numeric nodes

st_bridges(rectangles, 
            "name", 
            link_islands_k = 1) |> 
st_quickmap_nb(nodes = "numeric")

2. st_quickmap_nb()

Options: styling

st_bridges(rectangles, "name", link_islands_k = 1) |> 
  st_quickmap_nb(linkcol = "orange", bordercol = "darkblue",
                 pointcol = "pink", linksize = 0.5,
                 pointsize = 6, fillcol = "red",
                 bordersize = 1)

3. st_check_islands()

For transparency, shows summary of non-contiguous connections in a dataframe

st_bridges(
      rectangles, 
      "name", 
      link_islands_k = 1) |> 
st_check_islands()
island_names island_num nb_num nb_names
Rect4 4 2 Rect2
Rect5 5 3 Rect3
st_bridges(
      rectangles, 
      "name", 
      link_islands_k = 2) |> 
st_check_islands()
island_names island_num nb_num nb_names
Rect4 4 2 Rect2
Rect4 4 3 Rect3
Rect5 5 1 Rect1
Rect5 5 3 Rect3

4. st_manual_join_nb()

If we feel that 4 should also be connected to 3, this can be done manually.

st_bridges(
        rectangles, 
        "name") |> 
st_quickmap_nb(
    nodes = "numeric")

st_bridges(
        rectangles, 
        "name") |> 
st_manual_join_nb(3,4) |> 
st_quickmap_nb(
        nodes = "numeric")

5. st_manual_cut_nb()

And perhaps there is a desert between rectangles 1 and 2 which justifies removing the connection. We will edit it this time using names.

st_bridges(
      rectangles, 
      "name") |> 
st_quickmap_nb(
    nodes = "numeric")

st_bridges(rectangles, "name") |> 
  st_manual_join_nb(3,4) |> 
  st_manual_cut_nb("Rect1",
                    "Rect2") |> 
  st_quickmap_nb(
      nodes = "numeric")

Model

This object, of class nb, can be used for brms, INLA, rstan etc.

  • post-functions focus on mgcv models
    • flexible and versatile
    • extraction of spatial predictions can be cumbersome

Post-functions

function purpose
st_augment() augment the original dataframe with model predictions
st_quickmap_preds() generate quick maps of these predictions

st_augment() works with:

  • random effects (which are called with bs='re'), and
  • ICAR components (bs='mrf')

Creating an mgcv model

mgcv::gam(
# fixed intercept and effect for covariate  
            y ~ covariate + 
# random intercept at level region
            s(region, bs = "re") +
# random slopes at level region
            s(region, covariate, bs = "re") +
# ICAR varying intercept at level sub-region
            s(sub-region, 
              bs = 'mrf',
              xt = list(nb = data$nb)) +
# ICAR varying slope for covariate at level sub-region
            s(sub-region, by = covariate, 
              bs = 'mrf',
              xt = list(nb = data$nb)),
  data = data, 
  method = "REML")

Example 1: Indonesia earthquakes

Motivation:

  • Indonesia: a place with many islands
  • earthquakes: a phenomenon which does not respect land contiguity

Also:

  • not familiar with names

Earthquakes in Indonesia of magnitude > 5.5, 1985-2023. Categorised by magnitude as medium, large or extra-large.

Earthquake incidence in Indonesia, 1985-2023, mag > 5.5: count per square kilometre by province.

Indonesia faults. Surrounded by a 10 kilometre buffer.

Indonesia fault concentration. Square kilometre of buffered fault per square kilometre of province area.

Step 1. Pre-functions

Simple default structure and map

st_bridges(provinces_df, "province", 
            link_islands_k = 2) |> 
st_quickmap_nb(fillcol = "antiquewhite1", bordercol = "black", 
                bordersize = 0.5, linkcol = "darkblue", 
                linksize = 0.8, pointcol = "red", 
                pointsize = 2) + 
theme(panel.background = element_rect(fill = "#ECF6F7", 
                                        colour = "black", 
                                        linewidth=1.5),
      axis.text = element_blank()) +
geom_sf(data=nearby_countries_df, fill="gray50", 
        linewidth=0.5, colour="black")

Step 1. Pre-functions

Simple default structure and map

Step 1. Pre-functions

st_bridges(provinces_df, "province", link_islands_k = 2) |>
  st_check_islands()
island_names island_num nb_num nb_names
Bali 2 12 Jawa Timur
Bali 2 21 Nusa Tenggara Barat
Bangka-Belitung 3 9 Jambi
Bangka-Belitung 3 31 Sumatera Selatan
Kepulauan Riau 17 9 Jambi
Kepulauan Riau 17 24 Riau
Maluku 19 20 Maluku Utara
Maluku 19 22 Nusa Tenggara Timur
Maluku Utara 20 19 Maluku
Maluku Utara 20 27 Sulawesi Tengah
Nusa Tenggara Barat 21 2 Bali
Nusa Tenggara Barat 21 22 Nusa Tenggara Timur
Nusa Tenggara Timur 22 19 Maluku
Nusa Tenggara Timur 22 21 Nusa Tenggara Barat

Step 1. Pre-functions

  • numeric - don’t know names
  • concavehull - presence of multipolygons
st_bridges(provinces_df, "province", 
            link_islands_k = 2) |> 
st_quickmap_nb(fillcol = "antiquewhite1", bordercol = "black",
                bordersize = 0.5, linkcol = "tomato", linksize = 0.5,
                numericcol = "black",  numericsize = 6, 
                nodes = "numeric", 
                concavehull = TRUE, hullcol = "darkgreen", 
                hullsize = 0.4) + 
theme(panel.background = element_rect(fill = "#ECF6F7", 
                                      colour = "black", 
                                      linewidth=1.5),
      axis.text = element_blank())

Step 1. Pre-functions

  • numeric - don’t know names
  • concavehull - presence of multipolyons

Step 1. Pre-functions

Make manual cut/join alterations

st_bridges(provinces_df, "province", link_islands_k = 2) |> 
  st_manual_join_nb(3,13) |> 
  st_manual_join_nb(13,17) |> 
  st_manual_join_nb(14,25) |> 
  st_manual_join_nb(20,29) |>
  st_manual_join_nb(19,23) |> 
  st_manual_join_nb(16,27) |> 
  st_manual_join_nb(22,23) |> 
  st_manual_join_nb(7,19) |> 
  st_manual_join_nb(7,20) |>
  st_manual_join_nb(19,28) |> 
  st_manual_join_nb(4,18) |> 
  st_manual_join_nb(21,26) |> 
  st_manual_join_nb(22,28) |> 
  st_manual_cut_nb(19,22) |> 
  st_quickmap_nb()

Step 1. Pre-functions

Result of manual cut/join alterations

Step 2. Model

mod_pois_mrf <- gam(damaging_quakes_total ~ 
                      fault_concentration +
                      s(province, 
                        bs='mrf', xt=list(nb=prep_data$nb), k=24) +
                      offset(log(area_province)),
                data=prep_data, 
                method="REML",
                family = "poisson")

Step 3. Post-functions

plot_mrf <- mod_pois_mrf |> 
  st_augment(prep_data) |>
  st_quickmap_preds(scale_low = "darkgreen", scale_mid = "ivory", 
                    scale_high = "darkred", scale_midpoint = 0)

Workflow summary

sf dataframe

x
province
province_id
quake_density
damaging_quakes_total
damaging_quakes_density
area_fault_within
area_province
fault_concentration
geometry

|> st_bridges()

x
province
province_id
quake_density
damaging_quakes_total
damaging_quakes_density
area_fault_within
area_province
fault_concentration
nb
geometry

|> st_augment()

x
province
province_id
quake_density
damaging_quakes_total
damaging_quakes_density
area_fault_within
area_province
fault_concentration
nb
mrf.smooth.province
se.mrf.smooth.province
geometry

Workflow summary

# 1. set up neighbourhood structure
prep_data <- st_bridges(provinces_df, "province")

# 2. check and edit if necessary using 
#    st_quickmap_preds(), st_manual_join_nb(), st_manual_cut_nb()

# 3. define model
mod <- gam(quake_mlxl_total ~ fault_concentration +
                s(province, bs='mrf', xt=list(nb=prep_data$nb)) +
                offset(log(area_province)),
        data=prep_data, method="REML",family = "poisson")

# 4. augment tidy estimates
tidy_ests <- st_augment(mod, prep_data)

# 5. visualise them
st_quickmap_preds(tidy_ests)

Example 2: London

Motivation:

  • no islands, but
  • at smaller scale, e.g. city, geographical barriers such as rivers may be relevant
  • river Thames flows through London

River and contiguities

st_bridges(london, "NAME") |> 
  st_quickmap_nb(bordercol="black", bordersize=0.5, linkcol="blue") + 
  geom_sf(data=thames, colour="darkblue", linewidth=1.5) + 
  theme(panel.background = element_rect(
                    fill="#F6F3E9", colour="black", linewidth=1.5))

Riverside units only

st_bridges(london[riverside,],"NAME") |> 
  st_quickmap_nb(linksize = 0.5, bordercol = "black", 
                  bordersize=0.5,, linkcol="blue") +
  geom_sf(data=thames, colour="darkblue", linewidth=1.5) + 
  annotation_scale(location="br") + coord_sf(datum=NA) + 
  theme(panel.background = element_rect(
                    fill="#F6F3E9", colour="black", linewidth=1.5))

River crossings

1 kilometre buffer around crossings shaded green.

Visualise: st_quickmap_nb(nodes = “numeric”)

Boroughs which are not within 1 kilometre of a crossing are shaded pink.

Edit: st_manual_cut_nb()

# manually cut the links where there is no crossing

st_bridges(london[riverside,], "NAME") |> 
  st_manual_cut_nb(18,17) |> 
  st_manual_cut_nb(19,17) |> 
  st_manual_cut_nb(19,20) |> 
  st_manual_cut_nb(20,21) |> 
  st_manual_cut_nb(21,22) |> 
  st_manual_cut_nb(22,48) |> 
  st_manual_cut_nb(47,48) |> 
  st_manual_cut_nb(45,46) |> 
  st_manual_cut_nb(45,47) |>  
  st_manual_cut_nb(39,65) |> 
  st_manual_cut_nb(1,2) |> 
  st_quickmap_nb(bordercol = "black", bordersize = 0.5) +
  geom_sf(data=no_touch_buffer, fill = "pink", alpha = 0.3)

Riverside boroughs of Greater London. Contiguities across the river have been cut for the pink boroughs.

Example 3: Liverpool

Motivation:

  • include spatial hierarchical structure
  • combined with ICAR
  • based on ONS data used in multilevel spatial modelling course at University of Liverpool

Nested administrative divisions

code area type description
OA output areas lowest level of geographical area for census statistics, usually containing 100 - 625 persons
LSOA lower layer super output areas usually 4 or 5 OAs
MSOA middle layer super output areas usually 4 or 5 LSOAs

Unemployment and illness

OA level neighbours
st_bridges(liverpool, "oa_cd") |>
  st_quickmap_nb(pointsize = 0.1, linksize = 0.1, fillcol = "gray90")
Pre-functions and model
prepliv <- st_bridges(liverpool, 
                      "oa_cd")

liverpool_model <-
# fixed intercept and slope
  gam(unemployment ~ lt_illness +
# MSOA level random intercept
    s(msoa_cd, bs="re") +
# MSOA level random slope
    s(msoa_cd, lt_illness, bs="re") +
# LSOA level random intercept
    s(lsoa_cd, bs="re") +
# LSOA level random slope
    s(lsoa_cd, lt_illness, bs="re") +
# OA level ICAR random coefficients
    s(oa_cd, bs='mrf',
      xt=list(nb=prepliv$nb),k=20), 
  data=prepliv, method="REML")
Post-functions
liverpool_plots <- 
  liverpool_model |>
st_augment(prepliv) |>
st_quickmap_preds()

ggarrange(
  plotlist = 
liverpool_plots
)

MSOA level random effects

LSOA level random effects

OA level ICAR

Conclusions

  • removes blockage issue of disconnected units for inexperienced users

  • enables interactive construction of neighbourhood structure

  • allows quick model comparison

  • can be extended to other modelling packages in future

Thank you

Further details and vignettes:

https://horankev.github.io/sfislands

Contact:

Thanks to my supervisors:

Chris Brunsdon, Katarina Domijan